DOI: 10.5555/954014.954024

Differentiating data- and text-mining terminology

Published: 17 September 2003

ABSTRACT

When a new discipline emerges, it usually takes time and much academic discussion before its concepts and terms are standardised. Text mining is such a discipline. In a groundbreaking paper, Untangling text data mining, Hearst [1999] tackled the problem of clarifying text-mining concepts and terminology. This essay, a conceptual study, builds on Hearst's ideas by pointing out some inconsistencies and suggesting an improved and extended categorisation of data- and text-mining techniques. It first gives a short overview of the problems regarding text-mining concepts, followed by a summary and critical discussion of Hearst's attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge.

References

  1. ALBRECHT, R. AND MERKL, D. 1998. Knowledge discovery in literature data bases. In Library and information services in astronomy III. (ASP conference series, vol. 153.) http://www.stsci.edu/stsci/meetings/lisa3/albrechtrl.html.
  2. BERSON, A. AND SMITH, S.J. 1997. Data warehousing, data mining, and OLAP. McGraw-Hill, New York, NY.
  3. BIGGS, M. 2000. Resurgent text-mining technology can greatly increase your firm's 'intelligence' factor. InfoWorld 11(2), 52.
  4. CHEN, H. 2001. Knowledge management systems: a text mining perspective. University of Arizona (Knowledge Computing Corporation), Tucson, Arizona.
  5. CORNFORD, T. AND SMITHSON, S. 1996. Project research in information systems: a student's guide. Macmillan, Houndmills. (Information system series.)
  6. HALLIMAN, C. 2001. Business intelligence using smart techniques: environmental scanning using text mining and competitor analysis using scenarios and manual simulation. Information Uncover, Houston, TA.
  7. HAN, J. AND KAMBER, M. 2001. Data mining: concepts and techniques. Morgan Kaufmann, San Francisco, CA.
  8. HEARST, M.A. 1999. Untangling text data mining. In Proceedings of ACL'99: the 37th annual meeting of the association for computational linguistics, University of Maryland, June 20-26 (invited paper). http://www.ai.mit.edu/people/jimmylin/papers/Hearst99a.pdf.
  9. HOVY, E. AND LIN, C.Y. 1999. Automated text summarization in SUMMARIST. In Advances in automated text summarization. I. MANI AND M.T. MAYBURY, Eds. MIT Press, MA, 81-94. http://www.isi.edu/~cyl/.
  10. KONTOS, J., MALAGARDI, I., ALEXANDRIS, C. AND BOULIGARAKI, M. 2000. Greek verb semantic processing for stock market text mining. In Proceedings of natural language processing: 2nd international conference, Patras, Greece, June 2000, D.N. CHRISTODOULAKIS, Ed. Springer, Berlin, 395-405. (Lecture notes in artificial intelligence, no. 1835.)
  11. LUCAS, M. 1999/2000. Mining in textual mountains, an interview with Marti Hearst. Mappa Mundi Magazine, Trip-M, 005, 1-3. http://mappa.mundi.net/trip-m/hearst/.
  12. MACK, R. AND HEHENBERGER, M. 2002. Text-based knowledge discovery: search and mining of life-science documents. Drug discovery today 7(11) (Suppl.), S89-S98.
  13. NASUKAWA, T. AND NAGANO, T. 2001. Text analysis and knowledge mining system. IBM Systems journal 40(4), 967-984.
  14. NEW ZEALAND DIGITAL LIBRARY, UNIVERSITY OF WAIKATO. 2002. Text mining. http://www.cs.waikato.ac.nz/~nzdl/textmining/.
  15. PERRIN, P. AND PETRY, F.E. 2003. Extraction and representation of contextual information for knowledge discovery in texts. Information sciences 151, 125-152.
  16. PONELIS, S. AND FAIRER-WESSELS, F.A. 1998. Knowledge management: a literature overview. South African journal of library and information science 66(1), 1-9.
  17. RAJMAN, M. AND BESANÇON, R. 1998. Text mining: natural language techniques and text mining applications. In Data mining and reverse engineering: searching for semantics, S. SPACCAPIETRA AND F. MARYANSKI, Eds. Chapmann and Hall, London, 50-64.
  18. ROB, P. AND CORONEL, C. 2002. Database systems: design, implementation, and management, 5th ed. Course Technology, Boston, MA.
  19. STAIR, R.M. AND REYNOLDS, G.W. 2001. Principles of information systems: a managerial approach, 5th ed. Course Technology, Boston, MA.
  20. SULLIVAN, D. 2000. The need for text mining in business intelligence. DM Review, Dec. 2000. http://www.dmreview.com/master.cfm.
  21. SULLIVAN, D. 2001. Document warehousing and text mining: techniques for improving business operations, marketing, and sales. John Wiley, New York, NY.
  22. THURAISINGHAM, B. 1999. Data mining: technologies, techniques, tools, and trends. CRC Press, Boca Raton, Florida.
  23. WESTPHAL, C.R. AND BLAXTON, T. 1998. Data mining solutions: methods and tools for solving real-world problems. Wiley, New York, NY.
  24. ZORN, P., EMANOIL, M., MARSHALL, L. AND PANEK, M. 1999. Mining meets the web. Online 23(5), 17-28.
